class: center, middle, inverse, title-slide

# EAES 494: Unit 1 - 3 Review
## Course Feedback & Review Exercises
### Gavin McNicol
### 2021-10-06

---
class: middle

# Survey Results

---

## Goals of Group Exercises

.pull-left-wide[
**Practice Data Visualization and Data Wrangling**

**Get used to collaborative work**

*On RStudio Projects sharing the same GitHub repository.*
]

---

## Exercise Format

- Form groups (max 5 per group)
- All members clone `unit2-review-exercises-GROUP#` into RStudio Cloud
- We will provide a numbered list prompt. For example:
    1. Select the first 3 variables in the `penguins` data frame
    2. Filter for the `Gentoo` penguin species
    3. Pass the output to `ggplot()`
    4. Create a scatter plot between `bill_length_mm` and `flipper_length_mm`
    5. Visualize the different `species` with different shapes

---

## Exercise Format

- Form groups (max 5 per group)
- All members clone `unit2-review-exercises-GROUP#` into RStudio Cloud
- We will provide a numbered list prompt.
- Pick someone to go first (next birthday?)
    1. **If it's your turn:** Write code for the first item. Knit, commit *with a message*, and push!
        - **(Suggestion: "completed exercise 1.1" for Exercise 1 bullet 1)**
    2. **Everyone else:** Pull the changes from GitHub.
    3. **Next person clockwise:** Write code for the next item. Knit, commit, and push!
    4. Keep going until you complete the exercise. Check it runs!
    5. For the next exercise, rotate to a new "first person".
- NOTES: Help the person who is making, committing, and pushing changes.
- At least one person must be connected to the Zoom (to share screen later)
- If you are joining remotely, work on a `unit2-review-exercises-GROUP#` repository independently

---

## Data: Lake Vida (Antarctica) Sand Dataset

<img src="data:image/png;base64,#img/lake-vida-sites.png" width="40%" style="display: block; margin: auto auto auto 0;" />

Help page: Joey!
Four data frames (which we will wrangle, join, pivot, and visualize!):

1. `sand_minerals`: mineral types found in sand grains across 10 sites
2. `density`: the density of each mineral
3. `solubility_hardness`: the solubility and hardness of each mineral
4. `location_descriptions`: text descriptions of each location

---

## Warm-up

1. **Everyone:** Open up the starter file
2. **First person:** Update the YAML with group # (and name if you want) and date
3. **Everyone else:** Pull the changes and take a look

**Remember:**

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person (clockwise): Edit, knit, commit, push...

---

## Exercise 1: Inspect the first data frame

Under Exercise 1 in your starter file:

1. Delete existing text, insert and label a code chunk named `inspect-sand-minerals`
    - (Edit, knit, commit, push)
2. Add a line of code that outputs a "glimpse" of the `sand_minerals` data ;)
3. Add text narrative below the chunk stating the number of rows and variables
4. Add text stating what each row is (exactly!)
5. Add a second line of code within the original code chunk that outputs the variable names

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...

---

## Exercise 2: Inspect the other three data frames

Under Exercise 2 in your starter file:

1. Delete existing text, insert and label a code chunk named `inspect-other-dataframes`
2. Add three lines of code that output a "glimpse" of the other three data frames ;)
    - `density`, `solubility_hardness` and `location_descriptions`
3. Add text narrative below the chunk stating the number of rows and variables for each data frame
4. Add text stating what each row is (exactly!) in each data frame
5. Add a fourth, fifth, and sixth line of code within the original code chunk that output the variable names

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...
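---

## Sketch: inspecting a data frame

The inspection steps in Exercises 1 and 2 can be tried on a small stand-in table first. This is only a sketch: `toy_minerals` and its column names `location` and `minerals` are assumptions made up for illustration, and the real `sand_minerals` data frame may look different.

```r
library(dplyr)

# Hypothetical stand-in data; NOT the real Lake Vida data.
toy_minerals <- tibble(
  location = c("site_1", "site_1", "site_2"),
  minerals = c("quartz", "feldspar", "quartz")
)

# Compact overview: number of rows, number of columns, and a preview of each variable
glimpse(toy_minerals)

# Variable names only
names(toy_minerals)
```

The row and column counts printed by `glimpse()` are the numbers the narrative text should report.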
---

## Exercise 3: Practice joining

Under Exercise 3 in your starter file:

1. Delete existing text, insert and label a code chunk named `join-mineral-properties`
2. Write a pipeline to join `density` to `sand_minerals`, retaining all rows in `sand_minerals`
3. State in narrative below the chunk how many rows the output has. Is it the same as `sand_minerals`? Assign the output to a new data frame object named `sand_mineral_density`.
4. Write a second pipeline beneath the last one to join `solubility_hardness` to `sand_mineral_density`, retaining all rows in `sand_mineral_density`
5. Assign the output to a new data frame object named `sand_mineral_properties`.

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...

---

## Exercise 4: Single data frame wrangling

Under Exercise 4 in your starter file:

1. Delete existing text, insert and label a code chunk that outputs the number of sand grains observed per location (starting with `sand_mineral_properties`)
2. Insert and label a second chunk to slice out just the bottom 10 rows of the same data frame
3. Insert and label a third chunk to arrange by density in descending order and then slice out just the bottom 10 rows of the same data frame
4. Insert and label a fourth chunk to select the first and last columns, and then slice out just the bottom 5 rows of the data frame
5. Insert and label a final chunk to filter to *remove* NAs in the `hardness` column, arrange by density in descending order, then slice out just the top 5 rows of the data frame

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...

---

## Exercise 5: Single data frame wrangling

Under Exercise 5 in your starter file:

1. Delete existing text, insert and label a code chunk that outputs the distinct values of the column `density`
2. Insert and label a second chunk that counts the number of each `minerals` type observed
3. Insert and label a third chunk that adds a new variable called `density_1000` equal to the `density` column divided by 1000
4. Insert and label a fourth chunk that outputs a summary table showing the mean of the `density` values *across all rows*
5. Insert and label a final chunk that outputs a summary table showing the mean of the `density` values *grouped by location*

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...

---

## Exercise 6: Pivoting data

Under Exercise 6 in your starter file:

1. Delete existing text, insert and label a code chunk that takes `sand_minerals` as input and pivots the data frame *wider*
2. Add a new line of code to the chunk that pivots the data frame *longer* again
3. Insert and label a second chunk that takes `sand_mineral_density` as input and pivots the data frame *wider, where each new column is a mineral type and each row is a density*

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...
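---

## Sketch: join, wrangle, pivot

The verbs practiced in Exercises 3 to 6 can be sketched on toy tables before touching the real data. Everything below is an assumption for illustration: the `toy_*` objects, their column names, and the density values are made up and are not the Lake Vida data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in tables (illustrative values only)
toy_minerals <- tibble(
  location = c("site_1", "site_1", "site_2", "site_2"),
  minerals = c("quartz", "feldspar", "quartz", "gypsum")
)
toy_density <- tibble(
  minerals = c("quartz", "feldspar"),
  density  = c(2650, 2560)
)

# left_join() keeps every row of the left table; unmatched rows get NA
toy_joined <- toy_minerals %>%
  left_join(toy_density, by = "minerals")

# Count rows per location
toy_counts <- toy_joined %>%
  count(location)

# Remove NAs, sort by density (descending), keep the top 2 rows
toy_top <- toy_joined %>%
  filter(!is.na(density)) %>%
  arrange(desc(density)) %>%
  slice_head(n = 2)

# Grouped summary: mean density per location, ignoring NAs
toy_means <- toy_joined %>%
  group_by(location) %>%
  summarize(mean_density = mean(density, na.rm = TRUE))

# Pivot wider: one column per mineral type, filled with density values
toy_wide <- toy_joined %>%
  pivot_wider(names_from = minerals, values_from = density)
```

Comparing `nrow(toy_joined)` with `nrow(toy_minerals)` is a quick way to check that a left join kept every row of the left table.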
---
class: middle

# Part II - Data Visualization

<img src="data:image/png;base64,#img/grammar-of-graphics.png" width="60%" style="display: block; margin: auto auto auto 0;" />

---

## Data: starwars {dplyr}

.pull-left[
Help page and data dictionary: `?starwars`

```
## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
## $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
## $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
## $ films      <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return…
## $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
## $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…
```
]

---

## Exercise 7: Histograms

Under Exercise 7 in your starter file:

1. Delete existing text, insert and label a code chunk called `starwars-heights` and pipe `starwars` into `ggplot()`
2. Map `height` to the x-axis aesthetic and add a histogram geometry layer
3. Add a plot labels layer with `labs()`, including x- and y-axis labels and a title
4. Map `species` to the fill aesthetic and add a new label argument for `fill` within `labs()`
5. Facet the plot by `species`, using either facet function
6. Add a preset theme layer. These should be suggested to you as you type `+ theme...`
7. Add text narrative describing the shape, center and spread. You can use inline code and relevant functions (e.g., `mean()`) to add detail to your description.

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...

---

## Exercise 8: Scatter plots with smooth fits

Under Exercise 8 in your starter file:

1. Delete existing text, insert and label a code chunk called `weight-vs-height` and pipe `starwars` into `ggplot()`
2. Make a scatter plot of `height` (x axis) against `mass` (y axis)
3. Add a plot labels layer with `labs()`, including x- and y-axis labels and a title
4. Add a line of code between the first line and the `ggplot()` line that filters to **remove** characters with `mass` over 1000 kg
5. Add a smoothed trend line with `geom_smooth()` and facet by `homeworld`

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...

---

## Exercise 9: Bar plots

Under Exercise 9 in your starter file:

1. Delete existing text, insert and label a code chunk called `homeworld-barplot` and pipe `starwars` into `ggplot()`
2. Make a bar plot where `homeworld` is on the x axis
3. Add a plot labels layer with `labs()`, including x- and y-axis labels and a title
4. Map `gender` to an aesthetic such that each bar is split into two colors based on gender
5. Add an argument within the bar plot geom function that allows *better comparison of proportions*
6. Change the code chunk so that each bar on the x-axis is a gender, the y-axis is *no longer a proportion*, and the plot is faceted by `homeworld`
7. Add text narrative describing the character gender split across each homeworld.

Remember:

- 1 person: Edit, knit, commit, push
- Everyone: Pull changes and inspect
- Next person: Edit, knit, commit, push...
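---

## Sketch: layering a ggplot

The layering pattern in Exercises 7 to 9 (data, aesthetics, geometry, labels, facets, theme) can be sketched on a different dataset. This sketch deliberately uses the `mpg` data that ships with ggplot2 rather than `starwars`, so the variable choices and label text below are illustrative assumptions and not the exercise solutions.

```r
library(dplyr)
library(ggplot2)

# Histogram: map a numeric variable to x, fill by a category, then facet
p_hist <- mpg %>%
  ggplot(aes(x = hwy, fill = drv)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~ drv) +
  labs(
    x = "Highway mileage (mpg)",
    y = "Count",
    fill = "Drive train",
    title = "Highway mileage by drive train"
  ) +
  theme_minimal()

# Bar plot: position = "fill" rescales each bar to 1,
# which makes proportions easier to compare across bars
p_bar <- mpg %>%
  ggplot(aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  labs(x = "Vehicle class", y = "Proportion", fill = "Drive train")

p_hist
p_bar
```

Dropping `position = "fill"` returns the bars to raw counts.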
---

.center[
.large[
This class content was built from the Data Science in a Box source materials.

https://datasciencebox.org/index.html
]
]